models
通过创建一个继承 BaseModel
的子类(model)来定义一类对象(defining objects)
模型(model)可以是强类型语言中的类型(types),也可以是 API 接口中的参数
未经验证的 raw data 可以被传入 model,model 会根据自身的的字段定义对 untrusted data 进行验证
pydantic 不仅仅是一个验证模块(validation library),更是一个解析模块(parsing library),验证只提供对传入数据的类型与约束,而 pydantic 可以保证传出数据的类型与约束都是符合规则的。
pydantic is primarily a parsing library, not a validation library. Validation is a means to an end: building a model which conforms to the types and constraints provided.
In other words, pydantic guarantees the types and constraints of the output model, not the input data
基础使用
from pydantic import BaseModel
class User(BaseModel):
id: int
name = 'Jane Doe' # has default value, not required then
name
字段的类型被它的默认值所指出,所以就不用显式地定义了
user = User(id='123') # str was cast to an int
user_x = User(id='123.45') # float was cast to an int
聪明的实例化
assert user.__fields_set__ == {'id'}
__fields_set__
指出了在实例化时传入的字段
<model>.dict()
和 dict(<model>)
能将 model 转换为字典;不过 <model>.dict()
可能会包含一些其他数据(比如递归解析出嵌套的 dict 字段)
model 常用的方法与属性
属性
- ==
model_fields
==- model 定义字段的集合
model_computed_fields
- model 的计算属性 dict
model_fields_set
- model 实例化时传入的字段
方法
model_dump()
- 返回字典
- 原来的
dict()
已弃用
model_dump_json()
- 返回 json 字符串
model_copy()
- 返回拷贝(默认浅拷贝 shallow copy)
- ==
model_validate()
==- 曾用
parse_obj()
- class method,将一个 dict 转换为 model 的方法
- 曾用
model_validate_json()
- 曾用
parse_raw()
- class method,将字符串转换为 model 的方法
- 曾用
model_construct()
- class method,生成一个跳过校验的实例
嵌套 model
from typing import Optional
from pydantic import BaseModel
class Foo(BaseModel):
count: int
size: Optional[float] = None
class Bar(BaseModel):
apple = 'x'
banana = 'y'
class Spam(BaseModel):
foo: Foo
bars: list[Bar]
m = Spam(foo={'count': 4}, bars=[{'apple': 'x1'}, {'apple': 'x2'}])
print(m)
#> foo=Foo(count=4, size=None) bars=[Bar(apple='x1', banana='y'),
#> Bar(apple='x2', banana='y')]
print(m.dict())
"""
{
'foo': {'count': 4, 'size': None},
'bars': [
{'apple': 'x1', 'banana': 'y'},
{'apple': 'x2', 'banana': 'y'},
],
}
"""
ORM Mode
models 支持 ORM 模式
from typing import List
from sqlalchemy import Column, Integer, String
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.orm import declarative_base
from typing_extensions import Annotated
from pydantic import BaseModel, ConfigDict, StringConstraints
Base = declarative_base()
class CompanyOrm(Base):
__tablename__ = 'companies'
id = Column(Integer, primary_key=True, nullable=False)
public_key = Column(String(20), index=True, nullable=False, unique=True)
name = Column(String(63), unique=True)
domains = Column(ARRAY(String(255)))
class CompanyModel(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: int
public_key: Annotated[str, StringConstraints(max_length=20)]
name: Annotated[str, StringConstraints(max_length=63)]
domains: List[Annotated[str, StringConstraints(max_length=255)]]
co_orm = CompanyOrm(
id=123,
public_key='foobar',
name='Testing',
domains=['example.com', 'foobar.com'],
)
print(co_orm)
#> <__main__.CompanyOrm object at 0x0123456789ab>
co_model = CompanyModel.model_validate(co_orm)
print(co_model)
"""
id=123 public_key='foobar' name='Testing' domains=['example.com', 'foobar.com']
"""
要点
Config
属性orm_mode
必须为True
- 使用
.from_orm()
类方法来创建 model
保留字
You may want to name a Column
after a reserved SQLAlchemy field. In that case, Field
aliases will be convenient:
import typing
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base
from pydantic import BaseModel, ConfigDict, Field
class MyModel(BaseModel):
model_config = ConfigDict(from_attributes=True)
metadata: typing.Dict[str, str] = Field(alias='metadata_')
Base = declarative_base()
class SQLModel(Base):
__tablename__ = 'my_table'
id = sa.Column('id', sa.Integer, primary_key=True)
# 'metadata' is reserved by SQLAlchemy, hence the '_'
metadata_ = sa.Column('metadata', sa.JSON)
sql_model = SQLModel(metadata_={'key': 'val'}, id=1)
pydantic_model = MyModel.model_validate(sql_model)
print(pydantic_model.model_dump())
#> {'metadata': {'key': 'val'}}
print(pydantic_model.model_dump(by_alias=True))
#> {'metadata_': {'key': 'val'}}
嵌套的属性
When using attributes to parse models, model instances will be created from both top-level attributes and deeper-nested attributes as appropriate.
from typing import List
from pydantic import BaseModel, ConfigDict
class PetCls:
def __init__(self, *, name: str, species: str):
self.name = name
self.species = species
class PersonCls:
def __init__(self, *, name: str, age: float = None, pets: List[PetCls]):
self.name = name
self.age = age
self.pets = pets
class Pet(BaseModel):
model_config = ConfigDict(from_attributes=True)
name: str
species: str
class Person(BaseModel):
model_config = ConfigDict(from_attributes=True)
name: str
age: float = None
pets: List[Pet]
bones = PetCls(name='Bones', species='dog')
orion = PetCls(name='Orion', species='cat')
anna = PersonCls(name='Anna', age=20, pets=[bones, orion])
anna_model = Person.model_validate(anna)
print(anna_model)
"""
name='Anna' age=20.0 pets=[Pet(name='Bones', species='dog'), Pet(name='Orion', species='cat')]
"""
错误处理
校验过程中,一旦出错,pydantic 将会抛出 ValidationError
错误
from typing import List
from pydantic import BaseModel, ValidationError
class Model(BaseModel):
list_of_ints: List[int]
a_float: float
data = dict(
list_of_ints=['1', 2, 'bad'],
a_float='not a float',
)
try:
Model(**data)
except ValidationError as e:
print(e)
"""
2 validation errors for Model
list_of_ints
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='bad', input_type=str]
a_float
Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='not a float', input_type=str]
"""
解析数据的辅助函数
pydantic 为 model 提供两个 helper 类方法供其在 parsing data 时使用
model_validate()
- 跟
__init__
方法很像,只不过它接收一个 dict 而不是关键字参数(说实话,在__init__
里用**kwargs
传字典也能起到相同的作用)
- 跟
model_validate_json()
- str / bytes 都能传,它首先会转换成 json,然后再把 json 传进
model_validate()
- str / bytes 都能传,它首先会转换成 json,然后再把 json 传进
from datetime import datetime
from typing import Optional
from pydantic import BaseModel, ValidationError
class User(BaseModel):
id: int
name: str = 'John Doe'
signup_ts: Optional[datetime] = None
m = User.model_validate({'id': 123, 'name': 'James'})
print(m)
#> id=123 name='James' signup_ts=None
try:
User.model_validate(['not', 'a', 'dict'])
except ValidationError as e:
print(e)
"""
1 validation error for User
Input should be a valid dictionary or instance of User [type=model_type, input_value=['not', 'a', 'dict'], input_type=list]
"""
m = User.model_validate_json('{"id": 123, "name": "James"}')
print(m)
#> id=123 name='James' signup_ts=None
try:
m = User.model_validate_json('{"id": 123, "name": 123}')
except ValidationError as e:
print(e)
"""
1 validation error for User
name
Input should be a valid string [type=string_type, input_value=123, input_type=int]
"""
try:
m = User.model_validate_json('invalid JSON')
except ValidationError as e:
print(e)
"""
1 validation error for User
Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value='invalid JSON', input_type=str]
"""
如果你想转换其他格式的数据,那先把它转成 dict,再调用 model_validate()
么好了
如果你在 model_validate()
里传了一个 model 实例,除非你设置了 revalidate_instances
,否则不会再验证一遍了:
from pydantic import BaseModel, ConfigDict, ValidationError
class Model(BaseModel):
a: int
model_config = ConfigDict(revalidate_instances='always')
m = Model(a=0)
# note: the `model_config` setting validate_assignment=True` can prevent this kind of misbehavior
m.a = 'not an int'
try:
m2 = Model.model_validate(m)
except ValidationError as e:
print(e)
"""
1 validation error for Model
a
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='not an int', input_type=str]
"""
跳过验证,创建 model
construct()
方法允许你不经验证就创建 model,当信息源绝对可靠时,这样做可以提高效率(==不验证比验证快 30 倍==)
from pydantic import BaseModel
class User(BaseModel):
id: int
age: int
name: str = 'John Doe'
original_user = User(id=123, age=32)
user_data = original_user.model_dump()
print(user_data)
#> {'id': 123, 'age': 32, 'name': 'John Doe'}
fields_set = original_user.model_fields_set
print(fields_set)
#> {'age', 'id'}
# ...
# pass user_data and fields_set to RPC or save to the database etc.
# ...
# you can then create a new instance of User without
# re-running validation which would be unnecessary at this point:
new_user = User.model_construct(_fields_set=fields_set, **user_data)
print(repr(new_user))
#> User(id=123, age=32, name='John Doe')
print(new_user.model_fields_set)
#> {'age', 'id'}
# construct can be dangerous, only use it with validated data!:
bad_user = User.model_construct(id='dog')
print(repr(bad_user))
#> User(id='dog', name='John Doe')
_fields_set
参数是可选的,它可以准确地区分出传入的和默认赋值的参数;如果它被忽略,__fields_set__
就是传入的所有字段了
For example, in the example above, if _fields_set
was not provided, new_user.model_fields_set
would be {'id', 'age', 'name'}
.
Generic Models 泛型类
为了方便 model 的复用,pydantic 支持创建 generic models
按照以下步骤去创建一个 generic model
- 声明一个或多个
typing.TypeVar
实例,用于参数化你的 model - 声明一个继承自
pydantic.generics.GenericModel
和typing.Generic
的 pydantic model,并在其中传入你刚刚声明的typing.TypeVar
实例作为typing.Generic
的参数 - 在你想要的字段上,使用
typing.TypeVar
实例作为标注
Python 3.12 应该支持更简便的泛型语法,如
class Response[T](BaseModel)
以下是一个可复用的 HTTP 响应负载处理类(easily-reused HTTP response payload wrapper)的例子
from typing import Generic, List, Optional, TypeVar
from pydantic import BaseModel, ValidationError
DataT = TypeVar('DataT')
class DataModel(BaseModel):
numbers: List[int]
people: List[str]
class Response(BaseModel, Generic[DataT]):
data: Optional[DataT] = None
print(Response[int](data=1))
#> data=1
print(Response[str](data='value'))
#> data='value'
print(Response[str](data='value').model_dump())
#> {'data': 'value'}
data = DataModel(numbers=[1, 2, 3], people=[])
print(Response[DataModel](data=data).model_dump())
#> {'data': {'numbers': [1, 2, 3], 'people': []}}
try:
Response[int](data='value')
except ValidationError as e:
print(e)
"""
1 validation error for Response[int]
data
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='value', input_type=str]
"""
如果在 generic model 定义时设置了 model_config,或者用了 @field_validator 以及其他 Pydantic 装饰器,它们也需要被泛型化
如果要继承一个 GenericModel 并且不替换 TypeVar
实例的话,这个字类也必须继承一个 typing.Generic
from typing import Generic, TypeVar
from pydantic import BaseModel
TypeX = TypeVar('TypeX')
class BaseClass(BaseModel, Generic[TypeX]):
X: TypeX
class ChildClass(BaseClass[TypeX], Generic[TypeX]):
# Inherit from Generic[TypeX]
pass
# Replace TypeX by int
print(ChildClass[int](X=1))
#> X=1
当然,可以部分继承父类的 TypeVar
实例,也可以新增 TypeVar
实例
from typing import Generic, TypeVar
from pydantic import BaseModel
TypeX = TypeVar('TypeX')
TypeY = TypeVar('TypeY')
TypeZ = TypeVar('TypeZ')
class BaseClass(BaseModel, Generic[TypeX, TypeY]):
x: TypeX
y: TypeY
class ChildClass(BaseClass[int, TypeY], Generic[TypeY, TypeZ]):
z: TypeZ
# Replace TypeY by str
print(ChildClass[str, int](x='1', y='y', z='3'))
#> x=1 y='y' z=3
甚至可以自定义泛型实例化后的名字 QAQ
from typing import Any, Generic, Tuple, Type, TypeVar
from pydantic import BaseModel
DataT = TypeVar('DataT')
class Response(BaseModel, Generic[DataT]):
data: DataT
@classmethod
def model_parametrized_name(cls, params: Tuple[Type[Any], ...]) -> str:
return f'{params[0].__name__.title()}Response'
print(repr(Response[int](data=1)))
#> IntResponse(data=1)
print(repr(Response[str](data='a')))
#> StrResponse(data='a')
当然,可以将泛型 model 嵌套进其他 model 中去
from typing import Generic, TypeVar
from pydantic import BaseModel
T = TypeVar('T')
class ResponseModel(BaseModel, Generic[T]):
content: T
class Product(BaseModel):
name: str
price: float
class Order(BaseModel):
id: int
product: ResponseModel[Product]
product = Product(name='Apple', price=0.5)
response = ResponseModel[Product](content=product)
order = Order(id=1, product=response)
print(repr(order))
"""
Order(id=1, product=ResponseModel[Product](content=Product(name='Apple', price=0.5)))
"""
在嵌套 model 中使用相同的 TypeVar
实例可以保证一致性
from typing import Generic, TypeVar
from pydantic import BaseModel, ValidationError
T = TypeVar('T')
class InnerT(BaseModel, Generic[T]):
inner: T
class OuterT(BaseModel, Generic[T]):
outer: T
nested: InnerT[T]
nested = InnerT[int](inner=1)
print(OuterT[int](outer=1, nested=nested))
#> outer=1 nested=InnerT[int](inner=1)
try:
nested = InnerT[str](inner='a')
print(OuterT[int](outer='a', nested=nested))
except ValidationError as e:
print(e)
"""
2 validation errors for OuterT[int]
outer
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='a', input_type=str]
nested
Input should be a valid dictionary or instance of InnerT[int] [type=model_type, input_value=InnerT[str](inner='a'), input_type=InnerT[str]]
"""
与那些内置类型(如 List
和 Dict
)类似,GenericModel 的行为会被以下规定所定义:
- 如果在实例化(instantiate) generic model 之前没有被指定类型(specify parameters)之前,传参会被视为
Any
类型 - 如果
TypeVar
未设置任何上限(bounds),传参会被视为Any
类型
类似于 List
和 Dict
,任何使用 TypeVar
传入 model 的参数都能被更具体的类型替换
from typing import Generic, TypeVar
from pydantic import BaseModel, ValidationError
AT = TypeVar('AT')
BT = TypeVar('BT')
class Model(BaseModel, Generic[AT, BT]):
a: AT
b: BT
print(Model(a='a', b='a'))
#> a='a' b='a'
IntT = TypeVar('IntT', bound=int)
typevar_model = Model[int, IntT]
print(typevar_model(a=1, b=1))
#> a=1 b=1
try:
typevar_model(a='a', b='a')
except ValidationError as exc:
print(exc)
"""
2 validation errors for Model[int, TypeVar]
a
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='a', input_type=str]
b
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='a', input_type=str]
"""
concrete_model = typevar_model[int]
print(concrete_model(a=1, b=1))
#> a=1 b=1
动态定义 model
有些时候,model 的定义直到程序运行时才会完整,pydantic 提供 create_model
方法,允许 model 在运行时被定义
from pydantic import BaseModel, create_model
DynamicFoobarModel = create_model('DynamicFoobarModel', foo=(str, ...), bar=123)
# same as below
class StaticFoobarModel(BaseModel):
foo: str
bar: int = 123
StaticFoobarModel
和 DynamicFoobarModel
在这里是完全一样的
动态 model 字段的定义,以下几种均可:
(<type>, <default value>)
的元组格式(<type>, Field(...))
的元组格式typing.Annotated[<type>, Field(...)]
__config__
和 __base__
这两个特殊参数可以用来自定义 model,比如添加额外字段来扩展已有的 model 之类的
from pydantic import BaseModel, create_model
class FooModel(BaseModel):
foo: str
bar: int = 123
BarModel = create_model(
'BarModel',
apple=(str, 'russet'),
banana=(str, 'yellow'),
__base__=FooModel,
)
print(BarModel)
#> <class '__main__.BarModel'>
print(BarModel.model_fields.keys())
#> dict_keys(['foo', 'bar', 'apple', 'banana'])
你甚至还能在 __validators__
参数中传入字典来实现 validator 的功能(震惊)
from pydantic import ValidationError, create_model, field_validator
def username_alphanumeric(cls, v):
assert v.isalnum(), 'must be alphanumeric'
return v
validators = {
'username_validator': field_validator('username')(username_alphanumeric)
# ...
}
UserModel = create_model(
'UserModel', username=(str, ...), __validators__=validators
)
user = UserModel(username='scolvin')
print(user)
#> username='scolvin'
try:
UserModel(username='scolvi%n')
except ValidationError as e:
print(e)
"""
1 validation error for UserModel
username
Assertion failed, must be alphanumeric [type=assertion_error, input_value='scolvi%n', input_type=str]
"""
自定义根类型与 RootModel
通过继承 RootModel
字段,Pydantic models 可以被定义为一个根类型(custom root type)
根类型可以是任何 pydantic 支持的类型,被 RootModel
的类型所指定
from typing import Dict, List
from pydantic import RootModel
Pets = RootModel[List[str]]
PetsByName = RootModel[Dict[str, str]]
print(Pets(['dog', 'cat']))
#> root=['dog', 'cat']
print(Pets(['dog', 'cat']).model_dump_json())
#> ["dog","cat"]
print(Pets.model_validate(['dog', 'cat']))
#> root=['dog', 'cat']
print(Pets.model_json_schema())
"""
{'items': {'type': 'string'}, 'title': 'RootModel[List[str]]', 'type': 'array'}
"""
print(PetsByName({'Otis': 'dog', 'Milo': 'cat'}))
#> root={'Otis': 'dog', 'Milo': 'cat'}
print(PetsByName({'Otis': 'dog', 'Milo': 'cat'}).model_dump_json())
#> {"Otis":"dog","Milo":"cat"}
print(PetsByName.model_validate({'Otis': 'dog', 'Milo': 'cat'}))
#> root={'Otis': 'dog', 'Milo': 'cat'}
"""